In a fast-moving digital landscape where readers skim headlines and scroll through feeds, only certain news articles manage to break through the noise and capture attention. These news articles grow popular not just from getting read but by getting shared, liked, and circulated online. The number of shares an article receives can determine how far it travels and how much influence it holds. For readers, this means the stories that appear at the top of their feed often guide what they see and think about. For editors and writers, it means understanding what drives virality is essential to keeping journalism relevant in an online environment. This project explores how features like article length, multimedia, sentiment, and topic relate to article shares, offering insights into the types of pieces that become popular.
I became interested in the analytics of digital journalism from my experiences working on The Cavalier Daily, UVA’s independent student newspaper. As a writer and editor, I saw how The Cavalier Daily’s social media presence influenced its views and audience engagement, as well as how popular articles shaped opinions and daily conversations across UVA. I grew curious about which features of an article contribute to the article’s circulation. For example, I wondered whether an article’s structure (e.g., length) affected its popularity more than its content (e.g., topic).
In this project, I use a range of data visualizations to explore how structural- and content-based factors relate to article popularity, measured by the number of shares an article gets. The quantitative variables in this analysis include the number of shares, title sentiment, content sentiment, polarity, subjectivity, and number of keywords. The qualitative variables include the day of publication, article topic, presence of videos, presence of images, and keyword type. I initially use simple data visualizations to explore three main features – time of publication, article length, and article topic – then expand this exploration to other features, such as article subjectivity and use of keywords. This project uses metadata of 500 online articles from Mashable, a digital media platform, collected over the span of two years (Fernandes et al., 2015).
The scatter plot shows the relationship between publication day and article shares, allowing users to explore whether publication timing may influence article popularity. By depicting the spread of articles published on each day, the scatter plot makes it clear whether certain days have highly variable performance versus more consistent engagement. The average number of shares for each day is depicted by the yellow points, and the value of the highest average is annotated and marked by a horizontal line. These markers combine individual-level detail with summary statistics, encouraging users to explore within-day distributions and make comparisons across days of the week.
The scatter plot reveals that the spread of article shares is fairly similar across all seven days, with Tuesday showing the most spread. The higher concentration of points on Tuesday, Wednesday, and Thursday suggests that more articles were published in the middle of the week. The yellow points show that articles published on Saturday have a higher average share count than those published on other days of the week. While this difference is modest, it suggests that publication day may be a relevant feature contributing to article popularity.
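The per-day summaries behind the yellow points and the annotated maximum can be sketched in a few lines. This is a minimal illustration, not the code used for the figure: the records below are hypothetical values invented for the example (they are not taken from the Mashable data), and the function name `summarize_by_day` is my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (publication day, shares) records.
# These are made-up numbers, NOT values from the Mashable dataset.
articles = [
    ("Monday", 1200), ("Monday", 800),
    ("Tuesday", 500), ("Tuesday", 3100), ("Tuesday", 900),
    ("Saturday", 2400), ("Saturday", 2000),
]

def summarize_by_day(records):
    """Return {day: {"mean": average shares, "range": max - min shares}}."""
    grouped = defaultdict(list)
    for day, shares in records:
        grouped[day].append(shares)
    return {
        day: {"mean": mean(values), "range": max(values) - min(values)}
        for day, values in grouped.items()
    }

day_summary = summarize_by_day(articles)

# Day with the highest average (the annotated point in a plot like this one).
top_day = max(day_summary, key=lambda d: day_summary[d]["mean"])
# Day with the widest spread of share counts.
widest_day = max(day_summary, key=lambda d: day_summary[d]["range"])

print(day_summary)
print("Highest average:", top_day)
print("Widest spread:", widest_day)
```

In this toy data, the day with the widest spread and the day with the highest average are different days, which is the same kind of distinction the plot draws between within-day variation and the daily average.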
This animated bar chart loops between two metrics – average shares and total shares – revealing how these values compare across article topics. The purpose of the bar chart differs from the scatter plot, which focuses on individual article-level variation and distributional spread. Here, aggregated visualizations highlight broad patterns across article topics, allowing users to observe between-category differences.
The animation reveals a key difference between average shares and total shares. In particular, Lifestyle articles tend to generate noticeably higher average shares than articles on other topics, yet they have the lowest total shares. Because total shares equal average shares multiplied by the number of articles, this contrast implies that fewer Lifestyle articles were published than articles on any other topic. Individual Lifestyle articles may therefore resonate strongly with readers and earn many shares per article, while the topic as a whole accumulates fewer shares than other topics. The contrast across categories reinforces the importance of topic selection as a key determinant of article popularity.
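The gap between the two metrics comes down to simple arithmetic, which the sketch below illustrates. The topic labels match the ones discussed in this project, but the share counts are hypothetical numbers chosen only to reproduce the pattern (a topic with few articles and a high average); the helper `summarize_by_topic` is likewise an illustrative name, not part of the original analysis.

```python
from collections import defaultdict

# Hypothetical (topic, shares) records chosen to illustrate the pattern.
# These are made-up numbers, NOT values from the Mashable dataset.
records = [
    ("Lifestyle", 4000), ("Lifestyle", 3000),
    ("Tech", 1500), ("Tech", 2500), ("Tech", 2000), ("Tech", 2000), ("Tech", 2000),
    ("World", 1000), ("World", 3000), ("World", 2000), ("World", 2000),
]

def summarize_by_topic(rows):
    """Return {topic: {"count", "total", "average"}} for share counts."""
    grouped = defaultdict(list)
    for topic, shares in rows:
        grouped[topic].append(shares)
    return {
        topic: {
            "count": len(values),
            "total": sum(values),
            "average": sum(values) / len(values),
        }
        for topic, values in grouped.items()
    }

summary = summarize_by_topic(records)
highest_average = max(summary, key=lambda t: summary[t]["average"])
lowest_total = min(summary, key=lambda t: summary[t]["total"])

for topic, stats in summary.items():
    print(topic, stats)
print("Highest average:", highest_average)
print("Lowest total:", lowest_total)
```

Here the same topic has both the highest average and the lowest total, simply because it has the fewest articles. Reporting the article count next to both metrics makes that explanation visible.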
Article titles serve as the first point of contact for readers and often determine whether an article is clicked or shared. Understanding how title length relates to share count can reveal whether readers prefer concise headlines or more descriptive ones, and whether length plays a measurable role in article popularity. Additionally, longer articles may provide more depth, while shorter ones may hold attention better. Analyzing the relationship between content length and shares helps determine whether readers tend to share articles that are brief and digestible or more comprehensive in scope.
This Shiny app (https://miartan.shinyapps.io/stat3280app/) explores how the lengths of an article’s title and content relate to shares. The scatterplot shows no strong linear relationship between title length and shares; articles with both short and long titles appear across the full range of share counts. The LOESS curve suggests a subtle increase in shares for moderately long titles, but this effect is weak and overshadowed by the high variance. This implies that title content likely matters more than length alone. Length may help shape clarity or tone, but it does not consistently predict article popularity on its own.
The scatterplot also reveals wide variation in shares regardless of article length. The LOESS smoothing line slightly increases for mid-range content lengths, suggesting that extremely short or extremely long articles may be less likely to go viral. However, the overall weak relationship indicates that article length by itself is not a primary driver of popularity. Content richness, clarity, and topical relevance are likely more important than raw word count.
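For readers unfamiliar with LOESS, the sketch below shows the core idea: at each point, fit a straight line to the nearest neighbors, weighting closer points more heavily. It is a simplified version (local linear fits with tricube weights and no robustness iterations), written in plain Python for illustration. The `frac` value and the data are assumptions made for this example and are not necessarily the settings or values used in the app.

```python
import math

def loess(xs, ys, frac=0.5):
    """Simplified LOESS: a tricube-weighted local linear fit at each x."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # number of neighbors in each local window
    fitted = []
    for x0 in xs:
        distances = [abs(x - x0) for x in xs]
        h = sorted(distances)[k - 1] or 1.0  # bandwidth: k-th nearest distance
        # Tricube weights: near points count most; points at or beyond h get 0.
        weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in distances]
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical content lengths (in words) and share counts, made up for this
# example; they are NOT values from the Mashable dataset.
content_length = [100, 250, 400, 550, 700, 850, 1000, 1300, 1600, 2000]
shares = [900, 1400, 1100, 2600, 1800, 3000, 1500, 2200, 1000, 1200]

smoothed = loess(content_length, shares, frac=0.5)
for x, s in zip(content_length, smoothed):
    print(x, round(s))
```

A smaller `frac` follows the points more closely, while a larger one gives a flatter curve. With data as noisy as share counts, a slight rise or dip in the smoothed line should be read cautiously.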
The presence of multimedia often influences how readers perceive and engage with online articles. Images can improve comprehension and increase click-through likelihood, especially on fast-moving platforms. Videos are also commonly used to increase engagement, capturing attention but also requiring more viewer commitment. Examining how images and videos correspond with sharing behavior can reveal whether these two forms of multimedia contribute to virality and whether their influence is synergistic.
The first set of boxplots compares articles containing at least one image to those containing none. Articles with images tend to show a wider spread of share values, including higher outliers. This suggests that images may help amplify the reach of articles that already perform well. However, the presence of images does not guarantee high engagement across the board, as low-share articles exist in both categories. Overall, the pattern implies that images may act as a supportive feature that boosts already compelling content rather than driving popularity on their own.
The second set of boxplots shows the distribution of shares for articles with and without embedded videos. The presence of videos does not appear to drive article popularity, as articles with and without videos show a similar spread and median share count. Overall, while images and videos may appeal to users, their presence does not broadly enhance shares.
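The quantities a boxplot displays (median, quartiles, and interquartile range) can also be compared numerically. The sketch below does this for two groups using Python's standard `statistics` module. The share counts are hypothetical values made up for the example, and the quartiles use the module's default "exclusive" method, which may differ slightly from the quartile rule a plotting library applies.

```python
from statistics import median, quantiles

# Hypothetical share counts for two groups of articles.
# These are made-up numbers, NOT values from the Mashable dataset.
groups = {
    "video": [900, 1100, 1400, 1600, 5000],
    "no video": [800, 1200, 1400, 1700, 2100],
}

def box_stats(values):
    """Return the median, quartiles, and interquartile range of a sample."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": q3 - q1}

stats = {name: box_stats(values) for name, values in groups.items()}

for name, s in stats.items():
    print(name, s)
```

In this toy example the two groups share the same median even though one contains a very high outlier. That is the situation described above: a single viral article stretches the upper whisker without shifting the typical share count.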
Across all analyses, the factors that most strongly relate to article shares are topic, keyword performance, and emotional tone. Certain topics – especially Lifestyle, World, Business, and Tech – consistently achieve higher engagement, suggesting that subject matter drives much of the variability in popularity. Articles that contain strong, high-impact keywords also show substantially higher share counts, particularly within news-oriented domains. Emotional characteristics such as sentiment polarity and subjectivity contribute meaningfully as well, but more as amplifiers than primary drivers: articles with moderate polarity and moderate subjectivity tend to perform best. In contrast, structural attributes such as title length, content length, and multimedia presence show weaker and more inconsistent relationships with shares. Overall, the results indicate that what an article is about and how well its themes align with audience interest matter far more than stylistic or structural features, emphasizing that topical relevance and keyword strength are the key predictors of article popularity.
Fernandes, K., Vinagre, P., & Cortez, P. (2015). A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. UCI Machine Learning Repository: Online News Popularity Dataset.